regret and constraint violation
- North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
- North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (3 more...)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
- North America > United States > Pennsylvania (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Michigan (0.04)
- North America > Canada (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)
ProvablyEfficientModel-FreeConstrainedRLwith LinearFunctionApproximation
We study the constrained reinforcement learning problem, in which an agent aims tomaximize the expected cumulativereward subject toaconstraint on the expected total value of a utility function. In contrast to existing model-based approaches or model-free methods accompanied with a'simulator', we aim to develop thefirst model-free, simulator-freealgorithm that achieves a sublinear regret and a sublinear constraint violation even inlarge-scale systems.
- North America > United States > Ohio > Franklin County > Columbus (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Michigan > Wayne County > Detroit (0.04)
ProvablyEfficientModel-FreeConstrainedRLwith LinearFunctionApproximation
We study the constrained reinforcement learning problem, in which an agent aims tomaximize the expected cumulativereward subject toaconstraint on the expected total value of a utility function. In contrast to existing model-based approaches or model-free methods accompanied with a'simulator', we aim to develop thefirst model-free, simulator-freealgorithm that achieves a sublinear regret and a sublinear constraint violation even inlarge-scale systems.
- North America > United States > Ohio > Franklin County > Columbus (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Michigan > Wayne County > Detroit (0.04)
SimpleandFastAlgorithmforBinaryIntegerand OnlineLinearProgramming
Our algorithm employsonecolumn forsubgradient descent ineach iteration, whereas thedual project subgradient algorithm requires the whole constraint matrix and conducts matrix multiplication in each iteration. In addition, a class of backpressure/max-weight algorithms [25] are developed in the control/queueing literature and the backpressure algorithm can be interpreted from a view of pressuregradient.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- North America > United States > Virginia (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Michigan > Wayne County > Detroit (0.04)
- Asia > Middle East > Jordan (0.04)